Programming Massively Parallel Processors: A Hands-On Approach: The Origins of GPU Computing

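The emergence of the GPU was a fundamental break, driven by the "Real-Time Imperative": the non-negotiable requirement to render a complex 3D scene within a window of $1/60$ of a second (16.67 ms), the time budget for one frame at 60 frames per second. CPUs followed a multicore trajectory optimized for low-latency serial execution, and as display resolutions rose they hit a bottleneck.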
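
The 16.67 ms figure follows directly from the frame rate, and dividing it by the pixel count shows why rising resolution squeezes a serial processor. The following is a minimal sketch of that arithmetic; the function names and the example resolutions are illustrative assumptions, not values taken from the lesson.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def per_pixel_budget_ns(fps: float, width: int, height: int) -> float:
    """Average time per pixel, in nanoseconds, if pixels are computed one after another."""
    budget_ns = frame_budget_ms(fps) * 1_000_000  # milliseconds -> nanoseconds
    return budget_ns / (width * height)


if __name__ == "__main__":
    print(f"60 FPS frame budget: {frame_budget_ms(60):.2f} ms")
    # Example resolutions are assumptions chosen only to show the trend.
    for width, height in [(640, 480), (1024, 768), (1920, 1080)]:
        ns = per_pixel_budget_ns(60, width, height)
        print(f"{width}x{height}: about {ns:.1f} ns per pixel")
```

The per-pixel figure shrinks as the pixel count grows, which is the bottleneck the paragraph above describes for a processor that handles pixels serially.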

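In the mid-1990s, the game industry faced a crisis. A CPU that executed serially, and was already busy with AI and physics, could not compute millions of pixel values fast enough to keep frames smooth. This forced the development of dedicated hardware to offload the repetitive graphics pipeline.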

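Before parallel arrays inside a single chip appeared, 3dfx introduced Scan-Line Interleave (SLI). By using two separate graphics cards to render alternating horizontal scan lines, SLI shifted the industry's focus from single-threaded speed to raw "brute-force" throughput.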
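
The interleaving idea can be sketched as a round-robin split of scan lines across cards. This is a software illustration only, not a model of the actual 3dfx hardware; `assign_scan_lines`, `render_line`, and the placeholder shading are hypothetical names and logic introduced for the example.

```python
from typing import Dict, List


def assign_scan_lines(height: int, num_cards: int = 2) -> Dict[int, List[int]]:
    """Map each card index to the scan lines it renders, in round-robin order."""
    return {card: list(range(card, height, num_cards)) for card in range(num_cards)}


def render_line(y: int, width: int) -> List[int]:
    """Placeholder shading (an assumption): one intensity value per pixel."""
    return [(x + y) % 256 for x in range(width)]


def render_interleaved(width: int, height: int, num_cards: int = 2) -> List[List[int]]:
    """Build a frame by letting each card fill only the lines assigned to it."""
    frame: List[List[int]] = [[] for _ in range(height)]
    for card, lines in assign_scan_lines(height, num_cards).items():
        # Each card's lines are independent of the others, so on real
        # hardware the cards could work on them at the same time.
        for y in lines:
            frame[y] = render_line(y, width)
    return frame


if __name__ == "__main__":
    print(assign_scan_lines(6))
```

Because no scan line depends on another, the two halves of the work never need to coordinate until the lines are merged into the final frame.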

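GPU design prioritized spending silicon area on simple arithmetic units rather than on complex branch prediction. This "Wide and Slow" philosophy let GPUs handle the repetitive math of triangle calculations, while CPUs concentrated on non-parallel logic.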
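
A rough way to see why "Wide and Slow" can win on repetitive math is to compare peak arithmetic throughput as units × clock rate × operations per cycle. The sketch below assumes perfectly parallel work and ignores memory, branching, and scheduling; the unit counts and clock rates are made-up numbers for illustration, not figures for any real chip.

```python
def peak_ops_per_second(units: int, clock_hz: float, ops_per_cycle: int = 1) -> float:
    """Idealized peak throughput: every unit completes ops_per_cycle operations each cycle."""
    return units * clock_hz * ops_per_cycle


if __name__ == "__main__":
    # Hypothetical designs (assumed numbers, chosen only to show the trade-off).
    few_fast = peak_ops_per_second(units=4, clock_hz=3.0e9)     # few complex cores, high clock
    many_slow = peak_ops_per_second(units=512, clock_hz=1.0e9)  # many simple units, lower clock

    print(f"few fast units:  {few_fast:.2e} ops/s")
    print(f"many slow units: {many_slow:.2e} ops/s")
    print(f"ratio: {many_slow / few_fast:.1f}x")
```

The comparison only holds when the work splits into many independent pieces, as per-pixel and per-triangle math does; for a single dependent chain of operations the higher-clocked design finishes first.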

QUESTION 1

What is the specific 'time budget' required for 60 frames per second (FPS)?

33.33ms

16.67ms

10.00ms

100.00ms

QUESTION 2

How did 3dfx's SLI achieve early parallelism in consumer hardware?

By increasing the clock speed of a single chip.

By having two cards render alternating horizontal scan lines.

By sharing AI logic between the GPU and CPU.

By reducing the resolution of the frame.

QUESTION 3

Why did the GPU diverge from the standard multicore trajectory of CPUs?

GPUs needed deeper caches for complex branching.

GPUs prioritize throughput of simple math over low-latency serial logic.

CPUs became too expensive to manufacture for 3D graphics.

GPU architectures were designed to be smaller than CPUs.

QUESTION 4

In the context of 1990s gaming, what was the 'Real-Time Imperative'?

The requirement to run physics simulations on the GPU.

Processing millions of pixels within the strict frame window.

The transition from 16-bit to 32-bit computing.

Allowing the CPU to handle rasterization.

QUESTION 5

What is meant by the GPU's 'Wide and Slow' philosophy?

Using many simple processors at lower clock speeds to do massive work.

Designing physically wide chips that take longer to process data.

A design that favors high latency but high memory capacity.

Optimizing for single-threaded serial logic.